Add support for running vLLM #799

Merged
amaslenn merged 50 commits into main from am/vllm on Feb 12, 2026

Conversation

@amaslenn
Contributor

Summary

Two modes are supported at the moment, both single-node only:

  1. Disaggregated run.
  2. Non-disaggregated run.

Test Plan

  1. CI (extended)
  2. Manual runs.

Additional Notes

amaslenn and others added 3 commits February 10, 2026 14:04
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…eval_strategy.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@doc/workloads/vllm.rst`:
- Around line 92-93: The disaggregated TOML snippet is ambiguous because
[extra_env_vars] is shown at top level without context; update the example in
vllm.rst to show a complete TOML block including the top-level keys (e.g., name,
test_template_name, executor/test settings) and explicitly show whether
[extra_env_vars] is a sibling of [cmd_args] or nested under it (for example,
include [cmd_args] with its keys then a separate [extra_env_vars] section), so
readers can unambiguously see the intended section hierarchy and placement of
CUDA_VISIBLE_DEVICES.
- Line 73: Replace the awkward phrase "from less priority to more priority" in
the sentence "The number of GPUs can be controlled using the options below,
listed from less priority to more priority:" with a clearer alternative such as
"from lowest to highest priority" or "in order of increasing priority" so the
sentence reads e.g. "The number of GPUs can be controlled using the options
below, listed from lowest to highest priority:"; update the string where that
sentence appears in the vllm.rst documentation.

In `@src/cloudai/workloads/vllm/report_generation_strategy.py`:
- Around line 47-48: The use of functools.cache on parse_vllm_bench_output
causes indefinite memoization by Path and can return stale results if the file
changes; update the function to either remove the `@cache` decorator or change the
cache key to include the file's modification state (e.g., use an explicit
memoization keyed by (res_file, res_file.stat().st_mtime) or a TTL/lru cache) so
cached entries are invalidated when the file is updated; locate
parse_vllm_bench_output and replace the `@cache` usage with one of these
strategies to ensure fresh results for changed files.
- Around line 53-58: The except clause in the block that opens res_file and
calls VLLMBenchReport.model_validate(data) is redundant because
json.JSONDecodeError is already an Exception; update the except clause from
"except (json.JSONDecodeError, Exception):" to a single "except Exception:" so
it no longer lists duplicate exception types while preserving the current error
handling behavior.
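
A minimal sketch of both points above, assuming only the names the review mentions (`parse_vllm_bench_output`, `VLLMBenchReport.model_validate`) and borrowing the field names from a later review comment; the mtime-keyed dict stands in for `@cache` so a rewritten file is re-parsed, and the `except` clause collapses to a single `Exception`:

```python
import json
from pathlib import Path
from typing import Optional

from pydantic import BaseModel


class VLLMBenchReport(BaseModel):
    """Field names taken from the review comments; the real schema in the PR may differ."""

    mean_ttft_ms: float
    median_ttft_ms: float
    p99_ttft_ms: float
    std_ttft_ms: float
    mean_tpot_ms: float
    median_tpot_ms: float
    p99_tpot_ms: float
    std_tpot_ms: float


# Memoize by (resolved path, mtime): a rewritten file gets a new key, so stale
# entries are never served, unlike functools.cache keyed on the Path alone.
_parse_cache: dict[tuple[Path, float], Optional[VLLMBenchReport]] = {}


def parse_vllm_bench_output(res_file: Path) -> Optional[VLLMBenchReport]:
    if not res_file.is_file():
        return None
    key = (res_file.resolve(), res_file.stat().st_mtime)
    if key not in _parse_cache:
        try:
            with res_file.open() as f:
                data = json.load(f)
            report: Optional[VLLMBenchReport] = VLLMBenchReport.model_validate(data)
        except Exception:  # single clause; json.JSONDecodeError is already an Exception subclass
            report = None
        _parse_cache[key] = report
    return _parse_cache[key]
```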

In `@src/cloudai/workloads/vllm/slurm_command_gen_strategy.py`:
- Around line 258-268: The script launches the proxy in background (proxy_cmd,
PROXY_PID) and immediately starts the benchmark (bench_cmd), causing potential
failures if the proxy isn't ready; update the generated shell to wait for proxy
readiness by invoking the existing wait_for_health helper (or a short sleep)
against the proxy endpoint after starting the proxy and before running
bench_cmd, ensuring the health check references the same proxy port/URL used by
proxy_cmd and still retains PROXY_PID handling.
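
A sketch of the readiness gate for the generated script; `wait_for_health`, the port, and the `/health` path are assumptions based on this comment and the sequence diagram below, not the PR's actual helper names:

```python
def proxy_and_bench_lines(proxy_cmd: str, bench_cmd: str, port: int = 8000) -> list[str]:
    """Emit the proxy + benchmark section of the generated shell script."""
    return [
        f"{proxy_cmd} &",
        "PROXY_PID=$!",
        # Block until the proxy answers on the same port the benchmark targets;
        # a short sleep would be a cruder fallback if the proxy exposes no /health route.
        f"wait_for_health http://0.0.0.0:{port}/health",
        bench_cmd,
        # Existing PROXY_PID handling is retained: stop the proxy once the benchmark finishes.
        "kill $PROXY_PID",
    ]
```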

In `@tests/slurm_command_gen_strategy/test_vllm_slurm_command_gen_strategy.py`:
- Around line 55-60: The fixture vllm_disagg_tr mutates the shared vllm fixture;
instead create a fresh VllmTestDefinition instance (or deep copy the existing
vllm) inside vllm_disagg_tr, set its extra_env_vars =
{"CUDA_VISIBLE_DEVICES":"0,1,2,3"} and its cmd_args.prefill = VllmArgs() on that
new instance, then pass the new instance to TestRun(test=...) so vllm remains
unchanged; reference the vllm_disagg_tr fixture, VllmTestDefinition (or use
copy.deepcopy(vllm)), TestRun, and VllmArgs when making the change.
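
A sketch of the suggested fixture, assuming the test module's existing imports (`VllmTestDefinition`, `VllmArgs`, `TestRun`); `TestRun` is shown with only the `test` argument, while the real fixture would keep whatever other arguments it passes today:

```python
import copy

import pytest


@pytest.fixture
def vllm_disagg_tr(vllm: VllmTestDefinition) -> TestRun:
    # Deep-copy the shared fixture so these mutations never leak into other tests.
    disagg = copy.deepcopy(vllm)
    disagg.extra_env_vars = {"CUDA_VISIBLE_DEVICES": "0,1,2,3"}
    disagg.cmd_args.prefill = VllmArgs()
    return TestRun(test=disagg)
```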

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@amaslenn amaslenn requested a review from podkidyshev February 10, 2026 13:35
@greptile-apps
Contributor

greptile-apps bot commented Feb 10, 2026

Greptile Overview

Greptile Summary

This PR adds comprehensive vLLM support to CloudAI with both aggregated (single instance) and disaggregated (separate prefill/decode) modes for single-node execution.

Key Changes:

  • Implemented VllmTestDefinition with flexible GPU assignment via CUDA_VISIBLE_DEVICES or explicit gpu_ids
  • Created command generation strategy handling disaggregated mode with prefill/decode instances, proxy coordination, and health checks
  • Added report generation parsing JSON benchmark results for TTFT and TPOT metrics
  • Comprehensive test coverage for GPU detection, command generation, and both execution modes
  • Documentation with clear examples for configuration

Critical Issues:

  • Benchmark command formatting bug in slurm_command_gen_strategy.py:116-126 embeds arguments as --model {value} strings instead of separate list items, which will cause shell execution failures
  • Extra args formatting on line 111 creates single-string args instead of separate list elements
  • Log parsing on vllm.py:130 uses fragile split()[2] without validation

Architecture:
The disaggregated mode launches prefill and decode vLLM instances on different GPU sets, uses NixlConnector for KV cache transfer, coordinates via a proxy server, and waits for health checks before running benchmarks.

Confidence Score: 2/5

  • This PR has critical command formatting bugs that will cause benchmark execution failures in production
  • The benchmark command construction embeds arguments within strings (--model {value}) instead of as separate list elements, which will break when the command is executed by the shell. This affects core functionality and will prevent benchmarks from running. While the overall architecture is sound and tests are comprehensive, these execution bugs are blocking issues.
  • Pay close attention to src/cloudai/workloads/vllm/slurm_command_gen_strategy.py - the command formatting bugs must be fixed before merge

Important Files Changed

Filename | Overview
src/cloudai/workloads/vllm/vllm.py | Core vLLM implementation with argument models and success detection logic. Has a fragile log parsing issue that could fail with format variations.
src/cloudai/workloads/vllm/slurm_command_gen_strategy.py | Command generation for both aggregated and disaggregated modes. Critical bug in benchmark command formatting where arguments are embedded in strings instead of being separate list items.
src/cloudai/workloads/vllm/report_generation_strategy.py | Report generation from JSON output with clean error handling and clear metrics display.
tests/slurm_command_gen_strategy/test_vllm_slurm_command_gen_strategy.py | Comprehensive test coverage for GPU detection, command generation, and both aggregated/disaggregated modes.

Sequence Diagram

sequenceDiagram
    participant User
    participant CloudAI
    participant Slurm
    participant Container
    participant vLLM_Prefill
    participant vLLM_Decode
    participant Proxy
    participant Benchmark

    User->>CloudAI: Submit vLLM test (disaggregated mode)
    CloudAI->>Slurm: Generate sbatch script
    Slurm->>Container: Start prefill instance with CUDA_VISIBLE_DEVICES
    Container->>vLLM_Prefill: vllm serve --kv-transfer-config (producer)
    Slurm->>Container: Start decode instance with CUDA_VISIBLE_DEVICES
    Container->>vLLM_Decode: vllm serve --kv-transfer-config (consumer)
    
    CloudAI->>vLLM_Prefill: Health check /health endpoint
    vLLM_Prefill-->>CloudAI: Ready
    CloudAI->>vLLM_Decode: Health check /health endpoint
    vLLM_Decode-->>CloudAI: Ready
    
    Slurm->>Container: Start proxy server
    Container->>Proxy: python3 toy_proxy_server.py
    Proxy->>vLLM_Prefill: Connect to prefill port
    Proxy->>vLLM_Decode: Connect to decode port
    
    Slurm->>Container: Run benchmark
    Container->>Benchmark: vllm bench serve
    Benchmark->>Proxy: Send requests to port 8000
    Proxy->>vLLM_Prefill: Forward prefill requests
    vLLM_Prefill->>vLLM_Decode: Transfer KV cache via NixlConnector
    vLLM_Decode->>Proxy: Return generated tokens
    Proxy->>Benchmark: Return responses
    Benchmark->>Container: Write results to vllm-bench.json
    Container-->>CloudAI: Job complete
    CloudAI->>User: Generate report with metrics

Contributor

@greptile-apps greptile-apps bot left a comment


14 files reviewed, 2 comments


@greptile-apps
Contributor

greptile-apps bot commented Feb 10, 2026

Additional Comments (2)

src/cloudai/workloads/vllm/slurm_command_gen_strategy.py
Incorrect argv tokenization

get_vllm_bench_command() is returning items like "--model {cmd_args.model}" and extras like "--extra 1" as single list elements (later " ".join(...)). When this gets executed, flags/values won’t be passed as distinct argv tokens and any value containing spaces (or needing quoting) will be mis-parsed. This will break benchmark invocation for legitimate inputs.

Consider returning [..., "--model", cmd_args.model, "--base-url", f"http://0.0.0.0:{cmd_args.port}", ...] and for extras extras.extend([f"--{k}", str(v)]) so the script can safely join/execute without re-tokenization.
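
A sketch of that tokenization, taking `cmd_args.model`, `cmd_args.port`, and the extras dict from the comment above; everything else is illustrative:

```python
def get_vllm_bench_command(cmd_args, extra_args: dict[str, str]) -> list[str]:
    cmd = [
        "vllm", "bench", "serve",
        "--model", str(cmd_args.model),
        "--base-url", f"http://0.0.0.0:{cmd_args.port}",
    ]
    for key, value in extra_args.items():
        cmd.extend([f"--{key}", str(value)])  # two argv tokens, never one "--key value" string
    return cmd
```

Joining this list with spaces is then only safe when no value contains whitespace; passing the list through unjoined, or shlex.quote-ing each token before joining, avoids re-tokenization entirely.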


src/cloudai/workloads/vllm/vllm.py
Brittle success parsing

was_run_successful() parses the successful request count with int(line.split()[2]). If vLLM’s output format changes slightly (extra columns, different spacing, etc.), this will throw and you’ll fall through to reporting failure even when results are present (the exception is swallowed and the loop continues). A small regex like r"Successful requests:\s*(\d+)" (or split from the colon) would make this robust and avoid false negatives.
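
A sketch of the regex-based alternative; `was_run_successful` is the method named above, the helper itself is illustrative:

```python
import re

_SUCCESSFUL_RE = re.compile(r"Successful requests:\s*(\d+)")


def successful_requests(line: str) -> int | None:
    """Return the successful-request count if the line carries it, else None."""
    match = _SUCCESSFUL_RE.search(line)
    return int(match.group(1)) if match else None
```

was_run_successful() can then treat any non-None, positive count as success without risking an IndexError or ValueError on extra columns or unusual spacing.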

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/cloudai/workloads/vllm/report_generation_strategy.py`:
- Around line 31-45: VLLMBenchReport defines std_ttft_ms and std_tpot_ms but
they aren't shown in the generated report table; either remove these fields or
add them to the displayed metrics—update the report-generation code that
currently renders mean_ttft_ms/median_ttft_ms/p99_ttft_ms and
mean_tpot_ms/median_tpot_ms/p99_tpot_ms to also include std_ttft_ms and
std_tpot_ms (add headers, column values and formatting consistent with the other
stats), or delete std_ttft_ms/std_tpot_ms from VLLMBenchReport if intentionally
unused; ensure any serialization/deserialization and tests reference the updated
schema.
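
If the fields are kept, one way the extra columns could be rendered; this is purely illustrative, since the PR's actual report layout isn't shown here and only the `VLLMBenchReport` field names above are taken from the review:

```python
def format_metrics_table(r: "VLLMBenchReport") -> str:
    rows = [
        ("TTFT (ms)", r.mean_ttft_ms, r.median_ttft_ms, r.p99_ttft_ms, r.std_ttft_ms),
        ("TPOT (ms)", r.mean_tpot_ms, r.median_tpot_ms, r.p99_tpot_ms, r.std_tpot_ms),
    ]
    header = f"{'Metric':<10} {'mean':>12} {'median':>12} {'p99':>12} {'std':>12}"
    lines = [header]
    for name, *values in rows:
        lines.append(f"{name:<10} " + " ".join(f"{v:>12.2f}" for v in values))
    return "\n".join(lines)
```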

Contributor

@greptile-apps greptile-apps bot left a comment


14 files reviewed, no comments


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/cloudai/workloads/vllm/report_generation_strategy.py`:
- Around line 64-65: The cache key issue comes from passing potentially
non-normalized Path objects into parse_vllm_bench_output from
can_handle_directory; update can_handle_directory to resolve the path (e.g.,
call self.test_run.output_path.resolve() or resolve() on the
VLLM_BENCH_JSON_FILE path) before passing it to parse_vllm_bench_output so the
cached key is consistent with generate_report and other callers, and likewise
ensure any other call sites (like generate_report) also resolve the path before
invoking parse_vllm_bench_output to avoid inconsistent cache hits/misses.
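
A tiny sketch of the call-site fix, assuming the `VLLM_BENCH_JSON_FILE` constant and `test_run.output_path` attribute referenced above; resolving here (or once inside `parse_vllm_bench_output` itself) keeps the cache key identical across callers:

```python
def can_handle_directory(self) -> bool:
    res_file = (self.test_run.output_path / VLLM_BENCH_JSON_FILE).resolve()
    return parse_vllm_bench_output(res_file) is not None
```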

Contributor

@greptile-apps greptile-apps bot left a comment


14 files reviewed, 1 comment


Contributor

@greptile-apps greptile-apps bot left a comment


14 files reviewed, 3 comments


Contributor

@greptile-apps greptile-apps bot left a comment


15 files reviewed, 6 comments


Contributor

@greptile-apps greptile-apps bot left a comment


15 files reviewed, 1 comment


Co-authored-by: Ivan Podkidyshev <raashicat@gmail.com>
Contributor

@greptile-apps greptile-apps bot left a comment


15 files reviewed, 3 comments


@amaslenn amaslenn merged commit 0e23faa into main Feb 12, 2026
4 checks passed
@amaslenn amaslenn deleted the am/vllm branch February 12, 2026 16:54